Carlina Feldmann
Lennart Oelschläger
Last rendered on 14.09.2022
Welcome to this tiny course on data visualization in R with {ggplot2}! 👋
Potentially, plots can beautifully inform or horribly mislead. Colors and shapes matter! ⚖️
The {ggplot2} package implements a grammar of graphics, a series of distinct tasks to make a graphic.
Being in decent control of {ggplot2} to produce meaningful plots.
Basic R skills + a not-too-old version of R (>= 4.0.0) + RStudio
I’m sure you have! Please leave a note here. 🙏
Load {ggplot2}.
We need data, let’s go with an excerpt from the famous Gapminder dataset:
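A minimal sketch of how the data could be loaded and inspected, assuming the excerpt is the one shipped in the {gapminder} package from CRAN (the course material may provide it in another way):

```r
# Assumption: the data come from the {gapminder} package.
# install.packages("gapminder")
library(gapminder)

head(gapminder)  # first six rows of the tibble
str(gapminder)   # column types and dimensions
```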
## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
## $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num [1:1704] 28.8 30.3 32 34 36.1 ...
## $ pop : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
First, we tell the ggplot() function what data we use and what variables we wish to see on each axis:
Something is missing … 🤔 We need an additional layer, a geom_* function!
There are more of them which we can simply add (literally add!):
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p <- p + geom_point() + geom_smooth()
p
As a last polishing step for now, we improve the x-axis scale and the plot labels.
p + scale_x_log10(labels = scales::dollar) +
  labs(x = "GDP per capita",
       y = "Life expectancy in years",
       title = "Economic growth as an indicator for life expectancy",
       subtitle = "Data points are country-years",
       caption = "Source: Gapminder")
Finally, we can use the ggsave() function to save our plot:
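A minimal sketch of such a call. The file name, size, and resolution are illustrative assumptions, and a simple plot is rebuilt here (with the data assumed to come from the {gapminder} package) so that the snippet runs on its own:

```r
library(ggplot2)
library(gapminder)  # assumption: source of the gapminder data

p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

# Hypothetical output location; width and height are in inches by default
out_file <- file.path(tempdir(), "gapminder_plot.png")
ggsave(filename = out_file, plot = p, width = 8, height = 5, dpi = 300)
```

The file extension determines the graphics device, so "gapminder_plot.pdf" would produce a PDF instead.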
ggplot()
data = ...
mapping = aes(...)
geom_*() functions
This course includes tutorials! 😎
Executing the following lines gives you access to the course material:
# install.packages("devtools")
devtools::install_github("loelschlaeger/howtoggplot2")
library(howtoggplot2)
To start the tutorial, type:
To open a copy of these slides, type:
To submit an issue on GitHub about this course, type:
Our goal is to plot the trajectory of life expectancy over time for each country in the gapminder data.
We must not forget to group by country! 💡
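A sketch of the grouped line plot, assuming the gapminder data from the {gapminder} package. Without the group aesthetic, geom_line() would connect all observations into a single zigzag line:

```r
library(ggplot2)
library(gapminder)  # assumption: source of the gapminder data

# One line per country: the group aesthetic tells geom_line()
# which observations belong together
p_lines <- ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(aes(group = country))
p_lines
```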
But can you make sense of this mess? Luckily, we can split the plot by continent:
ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(aes(group = country)) +
  facet_wrap(~continent)
Better not to facet_wrap(~country) … Let’s polish our plot with the things we already learned:
ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp)) +
  geom_line(color = "grey", aes(group = country)) +
  geom_smooth() +
  facet_wrap(~continent) +
  labs(x = "Year",
       y = "Life expectancy",
       title = "Life expectancy over time on five continents")
Notice that we supplied a formula to facet_wrap(). Formulas can be more advanced, for example with facet_grid():
ggplot(data = socviz::gss_sm, mapping = aes(x = age, y = childs)) +
  geom_point(alpha = 0.2) +
  geom_smooth() +
  facet_grid(sex ~ race) +
  labs(x = "Age",
       y = "No. of children",
       title = "Relationship between age and number of children",
       subtitle = "Separated by sex (in rows) and race (in columns)")
As a last input for this part, we learn four new geoms.
Using relative instead of absolute counts on the y-axis is covered in the tutorials.
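As a rough preview, and only as a sketch on the gapminder data (assumed to come from the {gapminder} package) rather than the data used in the slides: after_stat(prop) together with group = 1 puts proportions instead of counts on the y-axis.

```r
library(ggplot2)
library(gapminder)  # assumption: stand-in data for illustration

# Share of countries per continent in 2007.
# group = 1 makes the proportions relative to all bars together,
# not within each bar (which would give 1 everywhere).
p_rel <- ggplot(data = subset(gapminder, year == 2007),
                mapping = aes(x = continent)) +
  geom_bar(aes(y = after_stat(prop), group = 1)) +
  labs(x = NULL, y = "Share of countries")
p_rel
```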
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 10 rows containing non-finite values (stat_bin).
We address the message and the warning in the tutorials.
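One possible way to address both, sketched here on the gapminder data as a stand-in (the histogram above uses different data): an explicit binwidth silences the message, and na.rm = TRUE drops missing or non-finite values without a warning. The bin width of 5 years is an arbitrary illustrative choice.

```r
library(ggplot2)
library(gapminder)  # assumption: stand-in data for illustration

p_hist <- ggplot(data = gapminder, mapping = aes(x = lifeExp)) +
  # binwidth replaces the default of 30 bins;
  # na.rm = TRUE removes non-finite values silently
  geom_histogram(binwidth = 5, na.rm = TRUE) +
  labs(x = "Life expectancy in years", y = "Count")
p_hist
```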
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
ggplot(data = filter(gapminder, year == 2007),
       mapping = aes(x = pop,
                     y = reorder(continent, pop))) +
  geom_boxplot() +
  scale_x_log10() +
  labs(y = NULL,
       x = "Populations in 2007")
We look at a variant on the basic boxplot that {ggplot2} offers in the tutorials.
We can add text annotations to plots via geom_text():
ggplot(data = socviz::elections_historic,
       mapping = aes(x = popular_pct,
                     y = ec_pct,
                     label = winner_label)) +
  geom_point() +
  geom_text()
This is hard to read. Adjusting the position by hand is possible, but it is fiddly and not robust. The extension {ggrepel} is designed to do this task for us:
ggplot(data = socviz::elections_historic,
       mapping = aes(x = popular_pct,
                     y = ec_pct,
                     label = winner_label)) +
  geom_point() +
  ggrepel::geom_text_repel()
## Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
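The warning itself names the relevant argument. A small sketch of following its advice, with the caveat that labelling every point may make the plot more crowded:

```r
library(ggplot2)

p_labels <- ggplot(data = socviz::elections_historic,
                   mapping = aes(x = popular_pct,
                                 y = ec_pct,
                                 label = winner_label)) +
  geom_point() +
  # max.overlaps = Inf asks {ggrepel} to keep every label,
  # no matter how many other labels or points it overlaps
  ggrepel::geom_text_repel(max.overlaps = Inf)
p_labels
```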
We can also annotate only selected points (outliers, for example) as follows:
ggplot(data = socviz::elections_historic,
       mapping = aes(x = popular_pct,
                     y = ec_pct,
                     label = winner_label)) +
  geom_point() +
  ggrepel::geom_text_repel(
    data = filter(socviz::elections_historic, popular_pct < 0.5 & ec_pct > 0.5)
  ) +
  geom_hline(yintercept = 0.5) +
  geom_vline(xintercept = 0.5)
## Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
And finally, we can place almost any annotation anywhere we wish via annotate():
ggplot(data = socviz::elections_historic,
       mapping = aes(x = popular_pct,
                     y = ec_pct,
                     label = winner_label)) +
  geom_point() +
  geom_hline(yintercept = 0.5) +
  geom_vline(xintercept = 0.5) +
  annotate(geom = "rect", xmin = 0, xmax = 0.5, ymin = 0, ymax = 0.5, fill = "red", alpha = 0.2) +
  annotate(geom = "text", x = 0.25, y = 0.25, label = "Some text.")
R can work with geographical data, and {ggplot2} can make choropleth maps.
world <- map_data("world")
p <- ggplot(data = world, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black")
plot(p)
Instead of the default Mercator projection, we can use the Albers projection:
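A sketch of how a projection could be applied with coord_map(), which relies on the {mapproj} package. The two standard parallels lat0 and lat1 are illustrative values, not necessarily those used in the slides:

```r
library(ggplot2)

# map_data() needs the {maps} package, coord_map() needs {mapproj}
world <- map_data("world")
p <- ggplot(data = world, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black")

# The Albers projection takes two standard parallels (assumed values here)
p_albers <- p + coord_map(projection = "albers", lat0 = 39, lat1 = 45)
p_albers
```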
Now, in the tutorials, we will visualize the results of the 2016 Trump vs. Clinton election on a map of the US states.
Reproduce this plot!
If you want to see some hints, scroll down this page.
Don’t forget to install and load the packages {ggplot2} and {dplyr} and load the gapminder dataset.
Hint 1: Use your {dplyr} knowledge to create an extract of the gapminder dataset that only contains values from the year 2007.
Hint 2: Have a look at the 3rd slide of this presentation to copy the basic syntax and remember how to modify the labels.
Hint 3: You can set the size and colour of the points to depend on certain variables in the aesthetics aes().
Hint 4: Have a look at ?guides to modify the legends.
{ggplot2} itself does not allow for interactive or animated visualizations.
However, there are of course R packages to achieve this, e.g. {plotly}, {gganimate}, and {shiny}.
library(gganimate)
library(gifski)
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop, colour = continent)) +
  geom_point(alpha = 0.7) +
  scale_x_log10(labels = scales::dollar) +
  guides(size = "none") +
  guides(colour = guide_legend(title = "")) +
  labs(x = "GDP per capita", y = "Life expectancy in years",
       title = "Economic growth as an indicator for life expectancy",
       caption = "Source: Gapminder")
p + transition_time(year) +
  labs(title = "Year: {frame_time}")
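To control the rendering and keep the result, the animation can be passed to animate() and anim_save(). The following is a sketch only: the frame count, frame rate, and file name are illustrative assumptions, and the data are assumed to come from the {gapminder} package.

```r
library(ggplot2)
library(gganimate)
library(gifski)
library(gapminder)  # assumption: source of the gapminder data

anim <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp,
                              size = pop, colour = continent)) +
  geom_point(alpha = 0.7) +
  scale_x_log10(labels = scales::dollar) +
  transition_time(year) +
  labs(title = "Year: {frame_time}")

# Fewer frames render faster; these values are arbitrary
gif <- animate(anim, nframes = 24, fps = 8, renderer = gifski_renderer())

# Hypothetical output location
out_gif <- file.path(tempdir(), "gapminder.gif")
anim_save(out_gif, animation = gif)
```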